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TN THE IINITRD STATES PATENT AND TRADEMARK OFFICE 

Attorney Docket No. KAK-004 

In re patent application of 
Hiroaki KITANO et al. 

Serial No. 10/018,571 Group Art Unit: unassigned 

Filed: December 20, 2001 Examiner: unassigned 

Title: METHOD AND DEVICE FOR NETWORK INFERENCE 

Commissioner of Patents and Trademarks 
Box DAC 

Washington, D.C. 20231 

PRELIMI N ARY A ME ND ME NT 

Sir: 

Prior to the initial examination, please amend the above-identified application as 
follows: 

IN THE CLAIMS: 

Please rewrite claims 1,9, 10 and 11 as set forth below in clean form: 
1. (Amended) A network estimation method, which using at least one independent, data- 
generating element estimates candidate networks that are descriptive of relationships between 
interrelated elements as a network and that, when data generated by said elements from said 
network is given, are capable of reproducing data from said data given; said method 
comprising the steps of: 

obtaining a plurality of candidate networks by producing a network structure, which 
may allow for reproduction of said data given, and a corresponding parameter set; and 

narrowing down an appropriate candidate network from said obtained networks, using 

i 



data different from said obtained data and generated from a network where a portion of the 
network has mutated. 

9. (Deleted) 

10. (Amended) A network estimation apparatus, which using at least one independent, data- 
generating element estimates candidate networks that are descriptive of relationships between 
interrelated elements as a network and that, when data generated from said network is given, 
are capable of reproducing said data from said data given; said network estimation apparatus 
comprising: 

first memory means for storing a network to be configured by a network structure and 
corresponding parameter set; 

second memory means for storing a network structure and corresponding parameter set, 
as a final candidate; 

means for producing a network structure, which may allow for reproduction of said 
data given, and corresponding parameter set, and obtaining and storing in said first memory 
means a plurality of candidate networks; and 

means for narrowing down and storing in said second memory means an appropriate 
candidate network from networks stored in said first memory means using data different from 
said obtained data and generated from a network where a portion of the network has mutated. 

11. (Amended) Recording media, stored with a program capable of providing a network 
estimation method mentioned in any one of Claims 1 through 8 for a computer system. 



2 



ii .ii n i" $ -n «e it F% «JL.t:'» ic o p> 

" J -Tu TjuI 7 ' Tuir 1»I<>1 'VlilP' nu i 1 " s*" „ji3i> JI5 -£ii»P- il ili-uj KluJ*' H.ir CBjijjs 



REMARKS 



This Preliminary Amendment includes the Amendment under PCT Article 34. 
Additionally, in accordance with 37 CFR § 1.121(c))(l)(ii), amended claims 1, 9, 10 and 11 
are set forth in a marked-up version in the pages attached hereto as APPENDIX. 

If the Examiner has any suggestions for placing this application in even better form, the 
Examiner is invited to telephone the undersigned and the number listed below. 



RADER, FISHMAN & GRAUER, PLLC 

The Lion Building 

1233 20* Street, N.W., Suite 501 

Washington, D.C. 20036 

Tel: (202) 955-3750 

Fax: (202) 955-3751 

Customer No. 23353 



Respectfully submitted, 




Date: April 25, 2002 



David K. Benson 



Reg. No. 42,314 
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APPKNDIX 

VERSION WITH MARKINGS TO S HOW CHANGES MADE 

Tn the Claims: 

Claims has been amended as follows: 

1 . A network estimation method, which using at least one independent, date-generating 
elements candidate networks that are descriptive of relationships between interrelated elements 
as a network and that, when data generated by said elements is given, are capable of 
reproducing data from said data given; said method comprising the steps of: 

obtaining a plurality of candidate networks by producing a network structure, which 
may allow for reproduction of said data given, and corresponding parameter set; and 

narrowing down an appropriate candidate network from said obtained networks using 
[other data that differs] data different from said obtained data and that can be generated from 
[said networks] network where a portion of the network has mutated - 

9. [The network estimation method mentioned in any one of Claims 1 through 8, wherein 
said other data is data generated by a network, a portion of which is mutant.] 

10. A network estimation apparatus, which using at le ast one independent, data-generating 
element estimates candidate networks that are descriptive of relationships between interrelated 
elements as a network and that, when data generated [by] from said elements is given, are 
capable of reproducing said data from said data given; said network estimation apparatus 
comprising: 

4 
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first memory means for storing a network that configures a network structure and 
corresponding parameter set; 

second memory means for storing a network structure and corresponding parameter set, 
as a final candidate; 

means for producing a network structure, which may allow for reproduction of said 
data given, and corresponding parameter set, obtaining and storing in said first memory means 
a plurality of candidate networks; and 

means for narrowing down and storing in said second memory means an appropriate 
candidate network from networks stored in said first memory means using [other] data 
different from said obtained data [that can be] and generated from a network [and that differs 
from said obtained data] where a portion of th e network has mutated. 

11. Recording media, stored with a program capable of providing a network estimation 
method mentioned in any one of Claims 1 through [9] & for a computer system. 
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DESCRIPTION 
NETWORK ESTIMATION METHOD AND APPARATUS 



Technical Field 

5 The present invention relates to inverse estimation of a system that can be 

represented with a network model comprised of interfering discrete elements related to 
each other, from information representing the behavior and status of such network. In 
particular, it is related to estimation, from gene expression data, protein concentration 
data, and related data occurring in biological phenomena, of the structure, etc. of a genetic 

10 control network bringing about that biological phenomenon and of a metabolic network 
showing reactions of enzymes and proteins. 

Background Art 

Due to recent rapid advances in molecular biology, much information concerning 
15 major organisms is continuously being amassed. Classic examples include yeasts, 
nematodes, and fruit flies. For example, with nematodes (C. elegans), estimation of their 
entire cell lineage and identification of bonds in their nervous system has already been 
carried out, with the determination of the entire DNAbase sequence being announced last 
year. The entire DNA base sequence of model organisms such as yeasts, E. coll and 
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mycoplasma, and even human has also already been determined. Nevertheless, there is a 
large difference between knowing the DNA sequence, and understanding each gene, its 
function, and the interactions between genes. 

Determination of such genetic functions and interactions between genes is an 
5 exceeding difficult task. A great deal of the current research in molecular biology is 
placing emphasis on locating genes related to noteworthy phenomena and the products of 
their transcription, and determination of their cascade regulation. At the stage where 
comprehension of the interaction between such genes takes place, understanding of 
complex interactions using only human intuition is extremely difficult. This is because of 

10 the large volume of data and various relationships between the interactions related to this 
that can be considered. We are attempting to overcome this difficulty by introducing a 
tool having computational ability. 

This theory is expressed schematically in FIG. 1. This diagram shows the 
relationship between the portion using a computer (virtual portion), and the portion 

15 actually biologically tested (real portion). In FIG. 1, first, once the phenomena to be 
investigated is specified, based on currently known biological information (FIG. 1 PHASE I 
A), a model is loaded into a computer (FIG. 1 PHASE I B). In addition, in cases where a 
hypothesis has already been presented, that hypothesis is loaded. Using the loaded 
model, a simulation is performed (FIG. 1 PHASE I C), and its consistency with observed 
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data is subjected to verification (FIG. 1 PHASE I D). When the simulation results and 
observed data do not match, two possibilities can be considered. First, the simulation 
may not have been exact. This can be solved by improving the accuracy of the simulation 
and making it more reliable. The next thing to be considered is that the model may be 
5 incomplete or incorrect. In this case, it is necessary to rebuild a model consistent with 
known test data. 

Afterwards, if possible, computer testing is performed using this model (FIG. 1 
PHASE II F). Actual biological testing is then begun after Phase II, or directly if Phase II 
is not possible, to see if the hypothetically decided models are correct (FIG. 1 PHASE III 
10 and IV). 

In this manner "reverse engineering of biological systems" can be carried out. 

Fields where it is necessary to narrow down hypothetically found models using 
this sort of computerized simulation, are not limited to fields of the above-mentioned 
organisms. 

15 

SUMMARY OF THE INVENTION 

The object of the present invention is to efficiently narrow down hypothetically 
found models using a computer. 

For example in the field of molecular biology, it lies in using a computer to 
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estimate, from such as data related to changes over time in the amount of genetic material 
or protein obtained using DNA chips, microarrays, or PCR, etc., and data relating to 
activity of neural networks obtained using such as electrophysiological testing or voltage 
sensitive dye, a model represented with a network with interactions of genes, enzymes, or 
5 proteins, and a neural network. The total number of models that can be considered from 
the given data is tremendous. This has conventionally been achieved by using human 
intuition to make guesses, evaluate them, and then carry out biological testing for 
confirmation of their validity. The present invention tries to estimate networks that may 
allow for generation of similar data, from given data by using a computer, in order to 
10 measurably narrow down the number of reliable networks into a smaller number of 
networks. In the case shown in the above-mentioned FIG. 1, this correlates to Phase I. 

Estimation of these models of interaction (network estimation) is necessary for 
identifying genetic diseases and estimating the effects of medications. Furthermore, this 
can also be used to determine unknown genes and gene products. 
15 l n order to fulfill the above objective, the present invention is a method of 

estimating candidate networks that are descriptive of relationships between interrelated 
elements as a network and that, when data generated by said elements is given, allow for 
reproduction of data from the given data; said method is characterized by comprising the 
steps of producing a network structure and corresponding parameter set that may allow 
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for reproduction of said data, and obtaining a plurality of candidate networks; and 
narrowing down an appropriate candidate network from said obtained networks using 
other data that can be generated from a network and that differs from said obtained data. 

This allows inverse estimation of a system that can be represented with a network 
5 model comprised of interfering discrete elements related to each other, from data 
representing the behavior, status, product, etc. of such network. 

When the number of network structures capable of producing said given data is 
small, it is possible for said step of obtaining candidate networks to include a step of 
producing all of those network structures. 
!0 Said step of obtaining candidate networks may also include a step of selecting 

network structures capable of reproducing said given data. This is performed in cases 
where there exist many networks including the parameters. Random selection of network 
structures, for example, can be performed for this selection step. 

In cases where selection is performed in this manner, it is possible to further 
15 include a step of producing with frequency network structures which reproduce data 
having a low margin of error with said given data. 

This increases the probability of selecting networks having a high likelihood of 
reproducing the given data. 

Furthermore, it is possible to include a step of reorganizing a portion of the 



6 

networks, which reproduce data having a low margin of error with said given data, so as to 
produce network structures. This can be achieved, for example, by using genetic 
algorithm (GA). 

These network structures are obtained through a coarse, global search of the 
5 network structure space. 

To supplement this global search, it is possible to include, as a search to obtain a 
network structure, a step of searching the structural vicinities of an arbitrary network and 
producing a network structure. Simulated annealing (SA), for example, may be used for 
this local search. 

10 By utilizing such a hybridization of a global search and a local search of network 

structures, it is possible to efficiently select an optimum network structure. 

Said step of obtaining candidate networks estimates one parameter or a set of a 
plurality of parameters different from one another for identical network structures, and for 
this, estimation is performed using genetic algorithm (GA), simulated annealing (SA), 
15 and/or an optimization technique such as the hill-climbing method. 

Through this, it is possible to speed up estimation of optimum network structure 
parameters for reproduction of the given data. 

In order to narrow down candidate networks obtained in this manner, data other 
than the given data is used; however, data generated by a network wherein a portion of the 



network is mutant may be used as this data. 

Thus, in the end, a comparatively small number of optimized candidate networks 

can be efficiently obtained. 

The present invention also includes an apparatus for executing the above- 
5 mentioned network estimation method and recording media stored with a program which 
allows a computer system to execute the network estimation method. 



BRIEF DESCRIPTION OF DRAWINGS 

FIG. 1 is a diagram for describing computer use in molecular biology. 
10 FIG. 2 is diagrams showing an example of a genetic network; (a) is an exemplary view of a 
genetic network; and (b) shows a network example. In FIG. 2(b), A, B, C, and D are 
genetic expressions. 

FIG. 3 is a diagram showing an expression profile and its corresponding network. A wild- 
type expression profile is shown in (a); an example where there are three nodes is shown in 
15 (b); a B expression profile is shown in (c); and a C expression profile is shown in (d). 
FIG. 4 is an abbreviated flow chart of an example of processing in the embodiment. 
FIG. 5 is a diagram describing the fan-out limit of the network. Bonds between nodes are 
shown in (a). The relationship of the constraints and the probability of fan-out is shown 
in (b). 
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FIG. 6 is a selection processing flowchart and FIG. 7 is a diagram for describing a selection 
processing technique. In FIG. 7, n topologies, each different from the others, are 
randomly chosen, and the introduction of a plurality of triplets through parameter tuning 
using GA, SA, etc. for one of the topologies is shown. Each value of the parameter set is 
5 different (For example, P 0 ° Pq 1 Pq M )- 

FIG. 8 is a diagram describing a new search using SA. In FIG. 8 there are P triplets in 
the topology space and the portion within the slashed line is the portion which becomes the 
target of SA. 

FIG. 9 is a diagram for describing selection processing of candidates, and shows the case 

10 where the topology is structured from x nodes. 

FIG. 10 is a diagram for describing an example of total processing of fitness, and shows an 
example where the sum of all degrees of fitness is assumed as the fitness of each topology. 
FIGS. 11(a) and (b) are an example presented where SA is performed and a diagram 
showing the initial grouping. 

15 FIG. 12 is a diagram showing an example of GA application. Parameters of the topology 
are shown in (a); and are represented by a list of bond weights and threshold values 
(slashed portion). Formation of the individual grouping of the second generation through 
mutation of the individual grouping of the first generation is shown in (b). 
FIG. 13 is diagrams showing the performance of GA processing in detail. Crossover is 
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shown in (a) and mutation in is shown (b). 

FIG. 14 is a diagram showing an example of two-dimensional GA application, wherein an 
example of two-dimensional GA crossover is shown. A schematic of the example shown in 
the bond matrix of (a) is shown in (b). 
5 FIG. 15 is an example for describing a vicinity search using SA; wherein a state transition 
diagram is shown in (a), a graph of the change in state transition probability is shown in 
(b), and topology is shown in (c). 

FIG. 16 is a flowchart showing candidate triplet selection processing using mutation. 
FIG. 17 is a diagram showing an example of a genetic/metabolic network; wherein (a) is an 
10 example of codes used in a genetic/metabolic network, and (b) is an example of a network 
in which they are used. 

FIG. 18 is a diagram showing data used in a target profile; wherein (a) is an example of 
zero-dimensional space data, (b) is one-dimensional space data, (c) is two-dimensional 
space data, and (d) is three-dimensional space data. 

15 

PREFERRED EMBODIMENTS OF THE PRESENT INVENTION 

An embodiment of the present invention is described forthwith in detail while 
referencing the drawings. 

The present invention can be applied to cases of estimating many different 
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variables determining the structure of a genetic control network bringing about a certain 
biological phenomena or a metabolic network showing reactions of enzymes and proteins, 
and reaction strength and speed, using profiles of, for example, genetic expression data or 
protein concentration data occurring in such phenomenon. An example in this case is 
5 described below. 



Genetic Network 

An embodiment of the present invention is described using an example of a 
network showing genetic interrelations such as those shown in FIG. 2. FIG. 2(a) shows 

10 the relationship showing activation of expression of material expressing the genes as nodes, 
and the relationship showing inhibition. The figure showing this as a network structure 
is FIG. 2(b). In FIG. 2(b), A, B, C, and D are types of genetic expression. 

The genetic network for a given target profile (expression profile), can be 
represented by a "triplet", which has three units comprising: network structure (topology), 

15 which shows the reacting relationship between structural elements (DNA, RNA, proteins, 
etc.); a parameter set, which is a set of variables necessary when describing a model of 
reactions and states within the network; and degree of fitness, which shows similarity with 
the target profile. 

To begin with, an example of a genetic network structure is described. The 
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reaction rules between structural elements (nodes) can be simplified and represented by N 
x N bond weight matrix W (parameter set) and bond matrix C (network structure). The 
value w of bond weight matrix W takes a real number between, for example, +1.0 and -1.0, 
with a negative value for w showing inhibition, and a positive value showing activation. 
5 In bond matrix C, the two values 0 or 1 for element c pq shows the bond relationship 
between node p and node q. 0 means that there is no bond relationship between nodes, 
and 1 shows that there is a bond relationship. In addition, the expressional state of a 
genetic network structure including N nodes occurring at time t, can be expressed with N- 
dimensional space vector x(t). Each element of vector x is represented by x/t) (i = 1, N). 
10 If S t (t) is made the concentration of expressions of element i (i = 1, N), and h , is made 
the threshold value determining the activation level of active elements i, each of the 
elements can be defined, for example, in the following manner. This function allows 
various definitions. 




here : F(u) - < 



0 (u < 0) 

u (0 < u < 1) 

1 (u>\) 



1 5 Outline of Processing of Embodiment 



In the case of such a genetic network, given an expression profile, which is shown 
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as the changes in concentration of a plurality of expression materials over time, it is the 
object of the present invention to find a candidate genetic network expressing this 
expression profile. For example, the expression profile such as shown in FIG. 3(a) is given, 
and a candidate genetic network such as shown in FIG. 3(b) is to be found. It is noted 
5 that the expression profile shown in this FIG. 3(a) is an expression profile for a wild-type 
genetic structure. It is preferable that the first expression profile used be a wild type such 
as shown here. 

The numbers of the bond (link) portions of the network show the absolute values 
of values w of bond weight matrix W mentioned above. It is noted here that whether the 
10 sign of value w of bond weight matrix W is negative or positive is determined by whether 
the bond is activation or inhibition. In addition, a mutant profile where there is no gene B 
and a mutant profile where there is no gene C are respectively illustrated in FIG. 3(c) and 
FIG. 3(d). The manner in which this profile is used is described later. 

This embodiment is performed in accordance with the process shown in the 
15 abbreviated flowchart of FIG. 4. In FIG. 4, first, the structure of a network allowing the 
targeted expression profile to be provided is assumed, that network structure (topology) is 
generated, and topology pool 300 is formed (S102). 

In cases where the number of network structures assumed is small, all of the 
network structures assumed can be generated and stored in topology pool 300; however, in 
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cases where the number of network structures assumed is large, it is necessary to provide 
certain limits for the topologies to be stored in topology pool 300. 

It is noted that it is not always necessary for the topologies to be generated and 
stored in topology pool 300. If the topology space to be formed can be denned based on the 
5 network structure assumed, an algorithm creating an arbitrary topology from this defined 
topology space can be utilized to achieve the same result as taking a topology from topology 
pool 300. 

Parameters from the expression profile are adapted to each respective created 
topology using the degree of fitness (S104). The parameters are the values of the above- 

10 mentioned bond weight matrix W and a threshold value. In addition, the degree of fitness 
is the likelihood of these parameters (how much they match). This degree of fitness is 
computed using a performance function; and this performance function may be, for 
example, a least square error. 

As described earlier, the combination of parameters and degree of fitness with the 

15 network structure (topology) is referred to as a triplet (a three -point set of topology, 
parameters, and degree of fitness). A triplet resulting from processing is stored in triplet 
pool 400. When a topology is selected, parameters are adapted, and the degree of fitness 
is calculated, optimization is performed for a given expression profile using techniques 
such as GA (genetic algorithms) or SA (simulated annealing : an annealing method). 
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Up to this point is the stage of extracting specimens (triplets), which are objects to 
be processed. From this point on is the stage of narrowing down specimens (triplets) 
adapted for the objective from triplet pool 400. In this processing, for example, mutation 
analysis is utilized (S106). As a result of this processing, specimens (triplets) are 
5 narrowed down, allowing an objective group of candidate triplets (candidate triplet pool 
500) to be formed. 

Network Extraction Process 

Processing involved in generation of network structure (S102) establishes, for 

10 example, the maximum assumable number of nodes included in the network. The 
structure of that network may be generated without any prior knowledge, or it may be 
generated based on a network, the structure thereof being partially known. The 
generated network is stored in topology pool 300. The known structure of the network in 
this case may have, for example, fan-out limits. This is described in FIG. 5. Fan-out 

15 refers to the number of actions (activations, inhibitions) from one node to another node. 
FIG. 5(a) shows that there are four fan-outs from A to B through E. It is considered that 
the number of fan-outs has limitations, and as in the example shown in FIG. 5(b), these 
limitations are shown as a probability. In FIG. 5(b), it is shown that the probability of the 
number of fan-outs is maximized near 2, and the probability of the number of fan-outs from 
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5 and higher can almost be ignored. It is noted that these limitations can be defined in 
various ways. 

In addition, in cases where already identified bonds and un-bonds are known, a 
network structure is produced for portions except for such bonds. When the network 
5 structure is to be produced, it is possible to produce it by hypothesizing bond structures in 
some portions. Topologies can also be prepared by applying constraints to various 
structures in this manner. 

Whether the parameters and the degree of fitness are adapted to all of the 
network structures (topologies) stored in topology pool 300 and then stored in triplet pool 

10 400, or extracted and then stored, depends on the number of networks that can be 
generated. When there are only a few networks capable of being generated, more 
specifically, in cases where there are few network nodes, since it is possible to 
comprehensively process all of the possible networks, the parameters and the degree of 
fitness can be adapted to all of the generated topologies and stored in triplet pool 400. 

15 However, in cases where there is an immense number of networks to generate and the total 
number cannot be examined, topologies are selected, parameters and degrees of fitness are 
apportioned, and they are stored in triplet pool 400. 



Case of Many Topologies Existing 
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Processing occurring in cases where there is an immense number of topologies and 
topology selection is necessary are described forthwith in detail using the flowchart in FIG. 
6, and FIG. 7 and FIG. 8. 

Referring to FIG. 7 and the flowchart of FIG. 6, first, the number of topologies to 
5 be selected is determined. In this case, N topologies are said to be randomly selected. 
From topology pool 300 (or the topology space) of FIG. 4, one topology 310 is then randomly 
taken (S202). Expression parameters capable of achieving the given expression profile 
are then adapted to this using, for example, GA (genetic algorithm) or SA (simulated 
annealing) (S204). At this point, M (412, 414, 416, and 418) differing parameter sets, 

10 which have the highest degree of fitness with the given expression profile according to each 
expression parameter, are taken and stored in triplet pool 400. This is repeated for N 
topologies. During the repetition, in the step of randomly selecting a topology (S202), one 
of a structure differing from earlier chosen topologies is selected. At the stage where 
repetition is ended (YES at S206), N x M triplets are stored in triplet pool 400. 

15 It is noted that in the above, topologies are randomly selected; however, other 

statistical samplings may be utilized. In addition, when M differing parameter sets are 
taken, the degrees of similarity among parameter sets are calculated, and when those 
degrees of similarity are extremely close to one another (less than a certain threshold), 
they may be viewed as equivalent and not selected. Since it is pointless to select a 
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plurality of similar solutions, this means that only a representative value is selected. 

Next, N network structures (topologies), which are the N triplets stored in triplet 
pool 400 that have been reorganized by topology, are obtained through, for example, GA/SA 
(S208). As with the above, a parameter set is adapted to each of these (N + 1) to 2N 

5 topologies obtained using GA/SA, and except for the M triplets having high degrees of 
fitness with the expression profile, they are stored in triplet pool 400 (S210). As a result, 
(2N x M) triplets are stored in triplet pool 400. Of these (2N x M) triplets, P triplets 
having values at or below a predetermined threshold value are selected from among those 
having high degrees of fitness and are left in triplet pool 400 (S212). With this, there are 

10 now P triplets stored in triplet pool 400. The processing up to this point has been a global 
triplet search. 



Local Search 

The vicinities of the P triplets stored in triplet pool 400 are searched using, for 
15 example, SA (simulated annealing) to find a triplet having a higher degree of fitness (S214). 
This is shown in FIG. 8. FIG. 8 shows a search performed on topology space including the 
respective vicinities of each of the P triplets, which is a space of interest for SA. Through 
this, triplets having an even higher degree of fitness can be obtained. In cases where a 
triplet having a higher degree of fitness is found, it is also stored in triplet pool 400 (S216). 
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This results in P or more triplets being stored in triplet pool 400. 

It is noted that at this point, in the case where a triplet having a higher degree of 
fitness is found in the vicinity of another triplet, that triplet is replaced and the triplet 
having the higher degree of fitness can be stored in triplet pool 400. In this case, the 
5 number of triplets stored in triplet pool 400 remains at P. 

Here, searches are performed in the vicinities using SA; however, it is also 
possible for other techniques to be performed, such as the hill climbing method (a method 
where the most favorable point of the possible search points is selected as the search 
progresses). 

10 As a result, since the improved degrees of fitness of the triplets are stored in 

triplet pool 400, at the point in time where the sampling stage ends, probability is high 
that the targeted triplet is among the triplets stored in triplet pool 400. Nevertheless, the 
number of triplets at this stage is still not low enough for biological testing to be 
preformed. 

15 

Narro wing-Down Process 

Next is the stage where processing is performed to select an even smaller number 
of candidate triplets from the triplets stored in triplet pool 400. This processing is 
described forthwith using the example of FIG. 9 and FIG. 10, which uses mutation 
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analysis. 

In FIG. 9, a mutant triplet is produced for each triplet from triplet pool 400 (S402). 
The mutant triplet in this case is produced by eliminating (knocking-out) a gene, and 
removing all of the bonds from that gene. At this point, neither the other bonds in the 
5 triplet nor the parameters are changed. This is then performed for the genes (nodes of the 
network structure). Accordingly, of the nodes in the topology of a triplet, once each node of 
the respective x genes is knocked-out, it is possible to have x mutant pools 452 to 456 
storing the respective mutant triplets. Each degree of fitness with each respective 
mutant profile, which is the given target data, is then evaluated for each mutant pool 
10 (S404). Next, the degrees of fitness evaluated for each mutant pool are integrated, and 
ultimately, a candidate group having a degree of fitness above a certain value is chosen and 
stored in candidate triplet pool 500 (S406). In this case, the obtained candidate triplets 
may be sorted by degree of fitness, and referencing that degree of fitness, biological testing 
can be performed starting with the highest. 
15 Besides using the above-mentioned knocking-out for the mutation used here, any 

of, for example, heterozygote, over expression, or temperature sensitive mutation methods 
may possibly be used. For example, a heterozygote is a mutation that reduces the total 
amount of genetic expression by half by crushing one of the allelic genes of a specific gene 
locus on a homologous chromosome. In addition, over expression is a mutation which 
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over-expresses a certain gene. Temperature sensitive mutation is a mutation that uses a 
certain temperature sensitive material to halt genetic expression, or over-express, at a 
certain time point while taking measurements. 

5 Example of Calculating Fitness Integration 

This example of calculating fitness integration is described forthwith using FIG. 
10. FIG. 10 shows an example where the total of the degrees of fitness of each topology of 
all of the mutations (including non-mutant wild-types) are handled as the fitness of that 
topology. In this calculation example, a simple total is done; however, depending on needs, 
10 each mutation may be weighted and added. 

A mutant expression profile does not always need to be obtained for all of the 
mutant gene networks to be examined. In this case, only the obtained mutant expression 
profiles are used to calculate fitness and perform integration. In this manner, candidate 
triplets can finally be obtained. 

15 

Embodiment 

The above-mentioned processing is described forthwith in detail using a concrete 
genetic triplet example. A genetic network for achieving an expression profile, which is 
data in time series as shown in FIG. 3(a), is used for this example. 
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To begin with, coding of a chromosome is performed based on a bond matrix 
representative of the bond relationships of the topology. Since there are three expression 
materials shown in FIG. 3(a), the topology of a network with three nodes is represented by 
the following bond matrix C. This is shown in FIG. 3(b). 

f \ 1 (A 



C = 



0 0 1 
-1 0 o y 



5 Since there are bonds at four loci in this topology and it is necessary to weigh each 

of them, a chromosome having four genotypes is produced. Since it is also necessary to 
optimize the threshold value of each node, a chromosome which eventually has, in total, 4 
(number of bonds) + 3 (number of node threshold values) = 7 genotypes is produced. The 
range (real number values) of possible threshold values and chromosome weights in this 

10 case are as shown in FIG. 11(a). A plurality of such topologies capable of achieving the 
expression profile shown in FIG. 3(a) are produced, and stored in topology pool 300 (FIG. 4) 
Optimization of parameters (weights and threshold values) is performed for each 
produced topology using GA. N combinations of parameters of the range determined for 
the topology are first generated at random, making an initial group of chromosomes. FIG. 

15 11(b) shows the initial group for the topology determined by the above-mentioned bond 
matrix. In this manner, chromosomes are established for the topologies in a list format. 
Optimization is preformed by applying GA to this initial group. The degree of fitness of 
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each parameter is used as an index of this optimization. 

The degree of fitness of the initial group is obtained by generating the expression 
profile at each individual piece based on the parameters of the chromosome and 
performing, for example, total sum square error TSS between each target profile. This 

5 total sum square error TSS performs the following calculations. 

As shown in FIG. 3(a), a target profile is given as time series data of concentration. 
For this reason, the concentration of each expression material A, B, and C occurring at 
time t is represented as LTA(t), LTB(t), and LTC(t), and the concentration of each 
expression material, which is computed using the above expression (1) based on each 

10 parameter of the generated chromosomes can be represented as LEA(t), LEB(t), and 
LEC(t). The total sum square error TSS between these can be represented in the 
following manner: 

tss = if Zfc* (0-W)) 

X t=0 

The total sum square TSS found in this manner is made to be the degree of fitness of each 
generated network (chromosome). 
15 By using a degree of fitness found in this manner, it is possible to perform 

optimization for the parameters (bond weights and threshold values) of each topology 
using GA. This processing is described using FIG. 12 and FIG. 13 with an example where 
optimization is performed using GA crossover and mutation. 
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As shown in FIG. 12(a), the topology parameter is represented as a list of bond 
weights and threshold values. To begin with, an individual pair, which is to be the 
parents, is determined using, for example, a tournament strategy. This includes 
randomly selecting the number of individuals corresponding to the tournament size (for 
5 example, 2). The individual having the highest degree of fitness among the individuals 
selected is made one of the parents. By carrying this out one more time, the pair of 
parents is determined. As shown in FIG. 12(b) crossover is caused for this individual pair 
to generate the offspring parameters; moreover the generated parameters are then 
subjected to mutation; and the degree of fitness is then calculated with the generated 

10 parameters. A second generation of individual groups is obtained by performing this sort 
of processing until, for example, an established number of offspring are generated. 

Crossover and mutation processing is described in detail using FIG. 13. In FIG. 
13(a), portions 610 shown by the arrows of the parameters of Parent 1 and Parent 2 
represented in the list are swapped so as to generate Child 1 and Child 2. This is the 

15 crossover processing. FIG. 13(b) shows that one of the circled parameters (bond weight 
0.4) 620 of Child 1 is changed to a separate value (bond weight 0.6) 630. This change is a 
mutation and subjected to processing so that it occurs randomly at, for example, a fixed 
probability. 

By repeating this crossover and mutation processing, for example, for a decided 
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number of generations, parameters for a certain topology can be optimized. It is noted 

that an elite preservation strategy can be applied to all generations. The elite 

preservation strategy is a technique involving passing on the individuals having the 

highest fitness in the group to the next generation without causing any changes through 

5 crossover and mutation. 

Once the above processing has finished for all of the topologies, from this, topology 

sub group D+, which has a degree of fitness below the (in the case of this example, of the 

total sum square error) degree of fitness, is defined using the following equation. 
D + = {D + e D\TSS(D + )<T + ] 

The triplet (topology, parameters, and degree of fitness) of the extracted topology 
10 subgroup is stored in triplet pool 400. It is noted that optimal parameters are generated 
through GA for all topologies stored in topology pool 300 in this embodiment; however, in 
the case where the number of topologies stored in topology pool 300 is huge, the 
established number of topologies may be randomly selected and the parameters optimized 
for this may be generated using GA. Up to this point has been the global search stage. 
15 The triplets stored in this triplet pool are further subjected to optimization using 

another GA technique, adding the resulting triplets. The GA applied here is a two- 
dimensional GA. This two-dimensional GA is described in FIG. 14. FIG. 14(a) is an 
example represented by a bond matrix, and the schematic thereof is shown in FIG. 14(b). 
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As shown in FIG. 14(a), the portions shown by the cross shape in the Parent 1 and Parent 2 
bond matrix (that is, the bond relationship centered on B; see FIG. 14(b)) are subjected to 
crossover, and Child 1 and Child 2 are generated. In this manner, unlike with crossover 
on the list (parameters) described in FIG. 13(a), crossover targeting the physical structure 
itself of the network is performed. Such processing is useful for optimization of a two- 
dimensional structure. Parameters for topologies obtained by applying two-dimensional 
GA are found by optimizing using GA similar to that mentioned earlier. This is executed 
for a decided number of generations. Then as described above, only triplets having a 
fitness grade of T+ or lower are further stored in triplet pool 14. It is possible to also 
apply an elite preservation strategy in the additional GA processing following this. 

SA (simulated annealing) is executed on triplets stored in triplet pool 400, the 
topology space in the vicinity of each topology is searched, and parameters optimized for 
the obtained topologies using GAin the same manner as earlier are found. The reason for 
performing searches in this vicinity is because the local searching ability of GA is inferior. 

The basic SA algorithm used is as follows using a pseudo-code. This is the 
portion applied to each topology. 

MAIN0{ INITIALIZEQ 



WHILE (DOES NOT SATISFY STOPPING REQUIREMENT)! 
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WHILE (DOES NOT REACH THE STATE OF EQUILIBRIUM) { 
GENERATE NEXT STATE 0 
IF (ACCEPT (E, E\ T)) 
RENEW 0 



T= REDUCE (T) 



The E within the above ACCEPT (E, E', T) is a variable defining a state called 
10 energy in SA, and in this case, uses the degree of fitness of each network. E' is the degree 
of fitness of the triplets generated during "GENERATE NEXT STATE 0" processing. In 
addition, in the SA algorithm T is a temperature parameter, and in this case, is a variable 
defining a state where only this processing is used. Variable T is set to an initial value 
T mit when processing begins. In addition, stopped state and balanced state can use, for 
15 example, the processing count. 



"ACCEPT (E, E', T)" and "REDUCE (T)" are defined in the following manner. 

fl E<E 

\e<E-niT Qther 
REDUCE(T) = y T (0<y<\) 



ACCEPT(E, E\ T) = 



The processing represented by the above pseudo code is described forthwith. 
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Once the topology, which is the object of the processing, is selected, variables T and E are 
initialized. Then, processing is repeated a certain number of times within the defined 
loop, and to begin with, processing called "GENERATE NEXT STATE 0" is performed. 
This "GENERATE NEXT STATE 0" performs the following processing. 

Based on the topology to be processed, topology with a changed bond relationship 
is generated. This change is based on the state transition diagram shown in FIG. 15(a), 
and the topology to be processed is changed according to probability P. This transitional 
probability is defined as the following using the above-mentioned variable T. 

f 77 T init T 1 T init < Threshold Value 

[Threshold Value Other 

Here, T is the above-mentioned variable and the threshold value is set to, for 
example, 0.5. The reason why the threshold value is introduced is because if the 
transitional probability is made very high, it would be necessary to search many changed 
topologies against the targeted topology, which is not a vicinity search. Since T changes 
at "REDUCE (T)", which is located on the outside loop, the state transition probability does 
not change within the inside loop. FIG. 15(b) shows how state transition probability 
changes with the outside loop steps. In this case, y is, for example, 0. 8. Changes in 
the topology are described with the example of topology shown in FIG. 15(c). This is the 
same as the topology in FIG. 3(b). It is noted that the case where state transition 
probability is 0.5 is described. In FIG. 15(c), since the bond from A to A is activation- 
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bonded, in the state transition diagram of FIG. 15(a), it is in activation state 606. From 
this state, only state transitions to un-bonded 604 are possible. The probability of a state 
transition occurring is 0.5. Next, if the bonds from A to B are looked at these are also 
activation -bonded, in the same manner, with a 0.5 probability, so that they may change to 
5 an un-bonded state. Since the bond from B to B is in an un-bonded state, there is a 0.25 
probability of state transition to an inhibition state and to an activation state each (total of 
0.5). In the following manner, all of the bond relationships are changed based on the state 
transition diagram of FIG. 15(a) and the probability in FIG. 15(b). 

In this manner, for each obtained topology, which is the result of each bond 

10 relationship forming the target topology changed with the probability mentioned above in 
accordance with each transition diagram, optimum parameters are next determined using 
the above-mentioned GA, and the degree of fitness of the parameters is calculated. This 
generates a new triplet. At this point "GENERATE NEXT STATE 0" is completed, and 
the next process "ACCEPT (E, E', T)" is executed. 

15 With this "ACCEPT (E, E', T)", the degree of fitness (E) of the triplet having the 

target topology, and the degree of fitness (E*) of the generated topology are compared, and if 
the degree of fitness of the generated topology is low (well adapted), 1 (true) is returned. 
In addition, even if the generated fitness is high (not well adapted), 1 (true) is returned 
with a certain probability: In this case, when T is high (when there are not very many 
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process times), the probability of 1 (true) being returned increases. When "ACCEPT (E, 
E\ T)" returns a 1 (true), "RENEW 0" processing is performed, and the triplet generated by 
"GENERATE NEXT STATE 0" is renewed. 

This continues until the state of equilibrium inside the loop is obtained (in this 
5 case until the number of processing times have been performed). When processing has 
been performed the predetermined number of times, the outside processing ("REDUCE 
(T)") is performed, a new T is found, and, using this T, inside processing is performed. In 
this case, outside processing is also performed, for example, the predetermined times. 

Then when the fitness of the triplet finally reaches T+, it enters the triplet pool. 
10 At t he time this stage ends in the above-mentioned manner, the vicinity of each 

topology in triplet pool 400 is searched and the replacement triplet having a more 
favorable fitness is stored. The processing ultimately selecting the candidate triplet from 
this triplet is described forthwith. This uses the expression profile of various mutations 
and calculates fitness with it. This processing is described using the flowchart of FIG. 16. 
15 The profile as shown in FIG. 3(c) and FIG. 3(d) is first prepared according to the 

mutation. FIG. 3(c) and FIG. 3(d) are a mutant profile where gene B is missing and a 
mutant profile where gene C is missing. The respectively corresponding mutant pools are 
then prepared and all of the triplets within the above-mentioned triplet pool are stored in 
each mutant pool (S802). 
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The triplet topologies are made into mutant topologies for the triplets stored in 
each mutant pool, and using the parameters as they are, the degree of fitness with the 
corresponding mutant profile is calculated (S804). 

Out of the triplets in each mutant pool, those triplets with a calculated degree of 
fitness at or below a predetermined threshold value established for each mutation are 
selected (S806). Those selected triplets within each mutant pool, which are triplets 
common in all mutant pools and have a degree of fitness at or below a predetermined 
threshold level, are stored in a candidate triplet pool (S808). 

With this, it is possible to decide upon appropriate candidate triplets. 

Other Applications 

In the above description, an example of a genetic network is described; however, it 
can also be applied to, for example, a gene tic/me tab olic circuit network. FIG. 17 shows a 
network of a genetic/metabolic circuit. As shown in FIG. 17(a), genes, metabolic enzymes, 
and protein are given as nodes with the relationships between activation, inhibition, and 
mediation represented as links. A network using these is shown in FIG. 17(b). The 
processing described above can be used for the network structure shown in FIG. 17(b). 

In the above, a genetic network and a metabolic circuit network are shown as a 
target network; however, it can also be applied to a neural circuit network as the target 
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network. The data (target profile) in this case would require the use of neuron activation 
potential. 

In this manner, it is possible to use the techniques of the present invention for 
various objects. Various examples of applicable target profiles are shown in FIG. 18. 

5 . FIG. 18(a) shows the case where the target profile is zero- dimensional space data. 
Examples of this data include data which fluctuates in terms of time or frequency. For 
example, this may be concentration, degree of activity, or electric potential. FIG. 18(b) 
shows one-dimensional space data. This data is of x space and of an amount dependent 
on time or space. This amount may be a concentration, degree of activity, or electrical 

10 potential. The target profiles shown in FIGS. 18(c) and (d) are two-dimensional space 
data and three-dimensional space data. These are data of an amount dependent on two 
dimensional (x, y) or three-dimensional (x, y, and z) space, and time or frequency. This 
amount may be a concentration, degree of activity, or electrical potential. As shown in 
FIG. 12, it is possible for a target profile to be given in various data formats. 

15 l n this manner, the present invention is effective in cases where there exists a 

target profile, and estimation of the network structure of that target profile is being 
attempted. 

The present invention may be structured from not only a stand-alone computer, 
but is also applicable with a cluster of computers, for example a client/server system. 
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The structure of the present invention can be achieved by reading a program from 
a recording medium stored with the program related to the present invention and 
executing it with a system. Such a recording medium includes DVD, MD, MO, floppy disk, 
CD-ROM, magnetic tape, ROM cassette, etc. 

5 

Industrial Applicability 

As described above, the present invention is extremely effective in cases where 
there exists data to be used as a target profile, and estimation of the network structure 
yielding that target profile is being attempted. 
10 In particular, it can be effectively applied to model estimation in molecular biology 

such as with a genetic network or metabolic network. 
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CLAIMS 

1. A network estimation method, which estimates candidate networks that are 
descriptive of relationships between interrelated elements as a network and that, when 

5 data generated by said elements is given, are capable of reproducing data from said data 
given; said method comprising the steps of: 

obtaining a plurality of candidate networks by producing a network structure, 
which may allow for reproduction of said data given, and corresponding parameter set; and 
narrowing down an appropriate candidate network from said obtained networks 
10 using other data that differs from said obtained data and that can be generated from said 
networks. 

2. The network estimation method mentioned in Claim 1, wherein said step of 
obtaining said plurality of candidate networks comprise a step of producing all of the 

15 network structures capable of reproducing said data given. 

3. The network estimation method mentioned in either Claim 1 or Claim 2, wherein 
said step of obtaining said plurality of candidate networks comprise a step of selecting a 
network structure capable of reproducing said data given. 
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4. The network estimation method mentioned in Claim 3, wherein said step of 
obtaining said plurality of candidate networks comprises a step of producing a network 
structure that reproduces, at a high frequency, data having a low margin of error with said 
data given. 

5 

5. The network estimation method mentioned in either Claim 3 or Claim 4, wherein 
said step of obtaining said plurality of candidate networks comprises a step of producing a 
network structure by reorganizing portion of the networks that reproduce data having a 
low margin of error with said data given. 

10 

6. The network estimation method mentioned in any one of Claims 3 through 5, 
wherein said step of obtaining said plurality of candidate networks comprises a step of 
producing a network structure by searching the structural vicinities of an arbitrary 
network. 

15 

7. The network estimation method mentioned in any one of Claims 1 through 6, 
wherein said step of obtaining said plurality of candidate networks estimates one 
parameter or a set of a plurality of parameters different from one another for an identical 
network structure. 
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8. The network estimation method mentioned in Claim 7, wherein said step of 
obtaining said plurality of candidate networks comprises a step of estimating parameters 
using a genetic algorithm (GA), simulated annealing (SA), and/or an optimization 
technique such as the hill-climbing method. 

5 

9. The network estimation method mentioned in any one of Claims 1 through 8, 
wherein said other data is data generated by a network, a portion of which is mutant. 



10. A network estimation apparatus, which estimates candidate networks that are 
10 descriptive of relationships between interrelated elements as a network and that, when 

data generated by said elements is given, are capable of reproducing data from said data 

given; said network estimation apparatus comprising: 

first memory means for storing a network that configures a network structure and 

corresponding parameter set; 
15 second memory means for storing a network structure and corresponding 

parameter set, as a final candidate; 

means for producing a network structure, which may allow for reproduction of 
said data given, and corresponding parameter set, obtaining and storing in said first 
memory means a plurality of candidate networks; and 
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means for narrowing down and storing in said second memory means an 
appropriate candidate network from networks stored in said first memory means using 
other data that can be generated from a network and that differs from said obtained data. 

5 11. Recording media, stored with a program capable of providing a network 
estimation method mentioned in any one of Claims 1 through 9 for a computer system. 
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ABSTRACT 

Topology pool 300 is produced by generating network structures (topologies) 
potentially being able to reproduce a target profile (S102). Best parameters to be able to 

5 reproduce the target profile are determined for the respective generated network 
structures, and the margin of error thereof is calculated (S104). Triplets that have been 
combined from the parameters and the margin of error for the topologies are stored in 
triplet pool 400. Up to this point is the sampling stage and from this point is the stage 
where the triplets in triplet pool 400 adapted to the objective are screened; mutation data, 

10 for example, is used for this screening (S106). Candidate triplet pool 500 is achieved once 
the screening stage has ended. 
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